In this session, you will learn:

  1. Alternative ways to create network structures.
  2. Different options to visualize networks and highlight their properties.
  3. How to analyse multi-modal networks.

Types of networks

So far, we have discussed different ways in which networks can be constructed. We mainly focused on:

  • Interaction between entities
  • Co-occurrence

However, network analysis and modelling is also fully consistent with other structures, which are often a natural outcome of supervised or unsupervised ML exercises:

  • Similarities
  • Hierarchies (tree structures)

Similarity networks

Since similarity is a relational property between entities, a similarity matrix can be modelled as a network. Let's illustrate that with the classic mtcars example.

mtcars %>% head() 

We could first run a PCA to reduce the dimensionality of the numerical data.

cars_pca <- mtcars[,c(1:7,10,11)] %>% 
  drop_na() %>%
  prcomp(center = TRUE , scale = TRUE)
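Note that the code above passes all nine components on to the next step. A minimal sketch, assuming one wants to decide how many components to keep, is to inspect the cumulative share of variance they explain. The names `pca` and `var_share` are illustrative and not part of the original workflow.

```r
# Same columns as above; mtcars ships with base R
pca <- prcomp(mtcars[, c(1:7, 10, 11)], center = TRUE, scale. = TRUE)

# Share of total variance captured by each principal component
var_share <- pca$sdev^2 / sum(pca$sdev^2)

# Cumulative share: how much is retained when keeping the first k components
round(cumsum(var_share), 2)
```

If only the first few components were to be kept, one could subset the columns of `pca$x` before computing distances.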

Next, we could create a distance matrix (using the dist() function).

cars_dist <- cars_pca$x %>% dist(method = "euclidean") 

Et voilà. Such a distance matrix represents a relational structure and can be modelled as a network.

g <- cars_dist %>% 
  as.matrix() %>%
  as_tbl_graph() 
g
## # A tbl_graph: 32 nodes and 992 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 32 x 1 (active)
##   name             
##   <chr>            
## 1 Mazda RX4        
## 2 Mazda RX4 Wag    
## 3 Datsun 710       
## 4 Hornet 4 Drive   
## 5 Hornet Sportabout
## 6 Valiant          
## # ... with 26 more rows
## #
## # Edge Data: 992 x 3
##    from    to weight
##   <int> <int>  <dbl>
## 1     1     2  0.408
## 2     1     3  2.57 
## 3     1     4  3.38 
## # ... with 989 more rows

Since the network is based on a distance matrix, we would like to reverse the weights so that edges represent similarity. Since similarity structures are usually fully connected networks, we probably also want to create some sparsity by deleting the edges with the lowest weights (here, everything below the 75th percentile).

g <- g %E>%
  mutate(weight = max(weight) - weight) %>%
  filter(weight >= weight %>% quantile(0.75)) %N>%
  filter(!node_is_isolated()) 
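The two steps above (reversing distances, then cutting at a quantile) can be traced by hand on a toy example in base R. This is only a sketch: the four points and the names `pts`, `d`, `el` and `el_sparse` are assumptions made for illustration, and each pair is kept once rather than in both directions.

```r
# Toy data: four points on a line (hypothetical, not taken from mtcars)
pts <- matrix(c(0, 1, 3, 7), ncol = 1,
              dimnames = list(c("a", "b", "c", "d"), NULL))
d <- as.matrix(dist(pts, method = "euclidean"))

# Keep each unordered pair once (upper triangle) as an edge list
idx <- which(upper.tri(d), arr.ind = TRUE)
el <- data.frame(from = rownames(d)[idx[, "row"]],
                 to   = colnames(d)[idx[, "col"]],
                 dist = d[idx],
                 stringsAsFactors = FALSE)

# Reverse distance into similarity: the closest pair gets the largest weight
el$weight <- max(el$dist) - el$dist

# Sparsify: keep only edges at or above the 75th percentile of similarity
el_sparse <- el[el$weight >= quantile(el$weight, 0.75), ]
el_sparse
```

Only the pairs that are closest to each other survive the cut, which is the intended effect of the quantile filter.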

Let's take a look!

g %>% ggraph(layout = "kk") + 
  geom_node_point() + 
  geom_edge_link(aes(width = weight), alpha = 0.25) +
  geom_node_text(aes(label = name)) +
  theme_graph()

Hierarchy (tree) networks

Hierarchical structures are also relational. The difference is that the connectivity structure tends to be constrained to links between different levels.

create_tree(20, 3) %>% 
    mutate(leaf = node_is_leaf(), root = node_is_root()) %>% 
    ggraph(layout = 'tree') +
    geom_edge_diagonal() +
    geom_node_point(aes(filter = leaf), colour = 'forestgreen', size = 10) +
    geom_node_point(aes(filter = root), colour = 'firebrick', size = 10) +
    theme_graph()

In addition to real-life examples such as organigrams, evolutionary trees etc., many ML models result in tree structures (e.g. decision trees).

For our car example, we will run a hierarchical clustering, which leads to a tree structure (visualized in the dendrogram).

cars_hc <- cars_dist %>%
  hclust(method = "ward.D2")

Again, this structure can be directly transferred to a graph object.

g <- cars_hc %>% as_tbl_graph()
g
## # A tbl_graph: 63 nodes and 62 edges
## #
## # A rooted tree
## #
## # Node Data: 63 x 4 (active)
##   height leaf  label         members
##    <dbl> <lgl> <fct>           <int>
## 1   0    TRUE  Porsche 914-2       1
## 2   0    TRUE  Lotus Europa        1
## 3   1.62 FALSE ""                  2
## 4   0    TRUE  Honda Civic         1
## 5   0    TRUE  Fiat X1-9           1
## 6   0    TRUE  Fiat 128            1
## # ... with 57 more rows
## #
## # Edge Data: 62 x 2
##    from    to
##   <int> <int>
## 1     3     1
## 2     3     2
## 3     8     6
## # ... with 59 more rows
g %>% ggraph(layout = 'dendrogram') + 
  geom_edge_diagonal() +
  geom_node_point() +
  geom_node_text(aes(filter = leaf, label = label), angle=90, hjust=1, nudge_y=-0.1) + 
  theme_graph() + 
  ylim(-.6, NA) 
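How a clustering result turns into a tree can be sketched from the merge table that hclust() returns. The following is a minimal base R illustration on toy data, not the conversion tidygraph performs internally; the node numbering (leaves first, then one internal node per merge) and the names `hc`, `child_id` and `tree_el` are assumptions made for this example.

```r
# Hierarchical clustering on a small toy data set (hypothetical example)
set.seed(1)
x <- matrix(rnorm(10), ncol = 2, dimnames = list(letters[1:5], NULL))
hc <- hclust(dist(x), method = "ward.D2")

n <- length(hc$order)  # number of leaves

# hc$merge has n - 1 rows: negative entries are leaves, positive entries
# refer to the cluster formed in that earlier row
child_id <- function(k) ifelse(k < 0, -k, n + k)

tree_el <- data.frame(
  from = rep(n + seq_len(n - 1), each = 2),  # internal node created by each merge
  to   = child_id(as.vector(t(hc$merge)))    # the two children joined in that merge
)
tree_el
```

With n leaves this gives 2n - 1 nodes and 2n - 2 edges, which for the 32 cars matches the 63 nodes and 62 edges printed above.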

Network Visualization

Visualize what, and why?

The main concern in designing a network visualization is the purpose it has to serve. What are the structural properties that we want to highlight? What are the key concerns we want to address?

Network maps are far from the only visualization available for graphs - other network representation formats, and even simple charts of key characteristics, may be more appropriate in some cases.

In network maps, as in other visualization formats, we have several key elements that control the outcome. The major ones are color, size, shape, and position.

g <- as_tbl_graph(highschool, directed = TRUE)
p_load(randomNames)

g <- g %E>%
  mutate(weight = sample(1:5, n(), replace = TRUE),
         year = year %>% as.factor()) %N>%
  mutate(class = sample(LETTERS[1:3], n(), replace = TRUE),
         gender = rbinom(n = n(), size = 1, prob = 0.5) %>% as.logical(),
         label = randomNames(gender = gender, name.order = "first.last"),
         cent_dgr = centrality_degree(mode = "in"),
         community = group_edge_betweenness(weights = weight, directed = TRUE) %>% as.factor()) %N>%
  filter(!node_is_isolated()) %E>%
  mutate(community_from = .N()$community[from])

Node Visualization

Nodes in a network are the entities that are connected. Sometimes these are also referred to as vertices. While the nodes in a graph are the abstract concepts of entities, and the layout is their physical placement, the node geoms are the visual manifestation of the entities.

Node positions

Conceptually one can simply think of it in terms of a scatter plot — the layout provides the x and y coordinates, and these can be used to draw nodes in different ways in the plotting window. Actually, due to the design of ggraph the standard scatterplot-like geoms from ggplot2 can be used directly for plotting nodes:

g %>%
  ggraph(layout = "nicely") + 
    geom_point(aes(x = x, y = y))

The reason this works is that layouts (which we will discuss in a moment) return a data.frame of node positions and metadata, and this is used as the default plot data:

g %>% create_layout(layout = "nicely") %>% head()

While using the default ggplot2 geoms is fine, ggraph comes with its own set of node geoms (geom_node_*()). By default, they already inherit the layout's x and y coordinates, and they come with extra features for network visualization.

g %>% ggraph(layout = 'nicely') + 
  geom_node_point()

Usually (but not always) when visualizing a network, we are interested in the connectivity structure as expressed by the interplay between nodes and edges. So, let's also plot the edges (using the geometries from the geom_edge_* family, which we will discuss in a moment).

g %>% ggraph(layout = 'nicely') + 
  geom_node_point() + 
  geom_edge_link(alpha = 0.25) 

Size

g %>% ggraph(layout = 'nicely') + 
  geom_edge_link(alpha = 0.25) +
  geom_node_point(aes(size = cent_dgr)) 

Color

g %>% ggraph(layout = 'nicely') + 
  geom_edge_link(alpha = 0.25) +
  geom_node_point(aes(size = cent_dgr, 
                      color = community)) 

Shapes

shapes()
##  [1] "circle"     "crectangle" "csquare"    "none"       "pie"       
##  [6] "raster"     "rectangle"  "sphere"     "square"     "vrectangle"
g %>% ggraph(layout = 'nicely') + 
  geom_edge_link(alpha = 0.25) +
  geom_node_point(aes(size = cent_dgr, 
                      color = community,
                      shape = gender)) +
  theme_graph() 

Labels

With the geom_node_text geometry, we can also add labels to the nodes. They are subject to the common aesthetics.

g %>% ggraph(layout = 'nicely') + 
  geom_edge_link(alpha = 0.25) +
  geom_node_text(aes(label = label, 
                     size = cent_dgr))

In large graphs, plotting all labels can look messy, so it might make sense to label only the important nodes.

g %>% ggraph(layout = 'nicely') + 
  geom_edge_link(alpha = 0.25) +
  geom_node_point(aes(size = cent_dgr, 
                      color = community,
                      shape = gender)) +
  geom_node_text(aes(label = label, 
                     filter = cent_dgr >= cent_dgr %>% quantile(0.8)), 
                 repel = TRUE) +
  theme_graph() 
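The filter expression used for the labels above is an ordinary logical vector. A small base R sketch with made-up in-degree values (the vector `cent` is hypothetical) shows which nodes would pass the 80th-percentile cut:

```r
# Hypothetical in-degree values for ten nodes
cent <- c(1, 2, 2, 3, 5, 8, 9, 12, 15, 20)

# Threshold at the 80th percentile of the centrality distribution
cutoff <- quantile(cent, 0.8)

# TRUE for the nodes that would receive a label
label_me <- cent >= cutoff
cent[label_me]
```

Raising or lowering the quantile is a simple way to trade label coverage against readability.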

Edge Visualization

Now that we have covered nodes, let's see how we can highlight aspects of edges, which are visualized with the geometries of the geom_edge_* family.

Size

g %>% ggraph(layout = 'nicely') + 
  geom_edge_link(aes(width = weight), alpha = 0.25) +
  geom_node_point(aes(size = cent_dgr, 
                      color = community,
                      shape = gender),
                  show.legend = FALSE) +
  theme_graph() 

Color

Notice that since we want to show the colors of potentially multiple edges between a node pair, I now use the geom_edge_fan geometry.

g %>% ggraph(layout = 'nicely') + 
  geom_edge_fan(aes(width = weight,
                     color = year), alpha = 0.25) +
  geom_node_point(aes(size = cent_dgr, 
                      color = community,
                      shape = gender),
                  show.legend = FALSE) +
  theme_graph() 

Density

g %>% ggraph(layout = 'nicely') + 
  geom_edge_link(alpha = 0.1) +
  geom_edge_density(aes(fill = year)) +
  geom_node_point(aes(size = cent_dgr, 
                      color = community,
                      shape = gender),
                  show.legend = FALSE) +
  theme_graph() 

Directionality

The easiest way to express directionality is by defining an arrow(), which comes with its own aesthetics.

g %>% ggraph(layout = 'nicely') + 
  geom_edge_fan(aes(width = weight,
                    color = year), 
                arrow = arrow(type = "closed", length = unit(2, "mm")),
                start_cap = circle(1, "mm"),
                end_cap = circle(1, "mm"),
                alpha = 0.5) +
  geom_node_point(aes(size = cent_dgr, 
                      color = community, 
                      shape = gender),
                  show.legend = FALSE) +
  theme_graph() 

Another nice trick is to work with alphas or colors, which change between start and end node.

g %>%
  ggraph(layout = 'nicely') + 
  geom_edge_fan(aes(width = weight,
                    color = community_from, # Edge color follows the community of the source node
                    alpha = after_stat(index)) # Alpha changes along the edge from start to end node
                ) +
  geom_node_point(aes(size = cent_dgr, 
                      color = community,
                      shape = gender),
                  show.legend = FALSE) +
  theme_graph() + 
  scale_edge_alpha("Edge direction", guide = "edge_direction")

Layouts

Ordinary graph style

pacman::p_load(ggpubr)
layout_list <- c("randomly", "circle", "grid", "fr", "kk", "graphopt")

g_list <- vector("list", length(layout_list))
for(i in seq_along(layout_list)){
  g_list[[i]] <- g %>% 
    ggraph(layout = layout_list[i]) + 
    geom_edge_fan(aes(width = weight)) +
    geom_node_point(aes(size = cent_dgr, 
                        color = community),
                    show.legend = FALSE) +
    theme_graph() +
    labs(title = paste("Layout:", layout_list[i], sep = " "))
}

ggarrange(plotlist = g_list, nrow = 2, ncol = 3, common.legend = TRUE, legend = "bottom")

Arcs and circles

# An arc diagram
g %>% ggraph(layout = 'linear') + 
  geom_edge_arc(aes(colour = community_from)) +
  geom_node_point(aes(size = cent_dgr, 
                      color = community),
                  show.legend = FALSE) +
    theme_graph() 

# A circular arc diagram
g %>% ggraph(layout = "linear", circular = TRUE) + 
  geom_edge_arc(aes(colour = community_from)) +
  geom_node_point(aes(size = cent_dgr, 
                      color = community),
                  show.legend = FALSE) +
    theme_graph() 

Hive plots

A hive plot, while still technically a node-edge diagram, is a bit different from the rest as it uses information pertaining to the nodes, rather than the connection information in the graph. This means that hive plots, to a certain extent are more interpretable as well as less vulnerable to small changes in the graph structure. They are less common though, so use will often require some additional explanation.

g %>%
  ggraph(layout = "hive", axis = "community") + 
  geom_edge_hive(aes(colour = factor(year))) + 
  geom_axis_hive(aes(colour = community), size = 2, label = FALSE) + 
  coord_fixed() +
  theme_graph()

Hierarchies

flare$vertices %>% head()
flare$edges %>% head()
g <- tbl_graph(flare$vertices, flare$edges)
# An icicle plot
g %>% ggraph('partition') + 
  geom_node_tile(aes(fill = depth), size = 0.25)

# A sunburst plot
g %>% ggraph('partition', circular = TRUE) + 
  geom_node_arc_bar(aes(fill = depth), size = 0.25) + 
  coord_fixed()

g %>% ggraph('circlepack') + # , weight = size
  geom_node_circle(aes(fill = depth), size = 0.25, n = 50) + 
  coord_fixed()

g %>% ggraph('tree') + 
  geom_edge_diagonal()

rm(list=ls())

Multi-Modal Networks

Now it's time to talk about an interesting type of network: multi-modal networks. This means a network has several “modes”, i.e. it connects entities on different conceptual levels. The most common one is a 2-mode (or bipartite) network. Examples could be an Author \(\rightarrow\) Paper, Inventor \(\rightarrow\) Patent, or Member \(\rightarrow\) Club network. Here, the elements in the different modes represent different things.

We can analyse them separately (and sometimes we should), but often it's helpful to “project” them onto one mode. Here, we create a link between two nodes of one mode through their joint association with a node of the other mode.

![](https://www.dropbox.com/s/e4vnq7kh24pyu0t/networks_2mode.png?dl=1){width=500px}
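A projection of this kind can be sketched with a small incidence matrix in base R. The authors, papers and the object names `inc` and `authors` below are made up for illustration.

```r
# Hypothetical author-by-paper incidence matrix: 1 = author wrote the paper
inc <- matrix(c(1, 1, 0,
                1, 0, 1,
                0, 0, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("Ann", "Bob", "Cem"), c("P1", "P2", "P3")))

# Project onto the author mode: entry [i, j] counts papers shared by i and j
authors <- inc %*% t(inc)

# The diagonal holds each author's own paper count, so drop it as a self-link
diag(authors) <- 0
authors
```

Ann and Bob end up connected because both appear on P1, while Ann and Cem share no paper and stay unconnected.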

While that sounds simple, it can be a very powerful technique, as I will demonstrate now.

#data <- whigs %>% as_tibble()
#data %>% head()
g <- create_bipartite(20, 5, directed = FALSE, mode = "out")
g
## # A tbl_graph: 25 nodes and 100 edges
## #
## # A bipartite simple graph with 1 component
## #
## # Node Data: 25 x 1 (active)
##   type 
##   <lgl>
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## # ... with 19 more rows
## #
## # Edge Data: 100 x 2
##    from    to
##   <int> <int>
## 1     1    21
## 2     1    22
## 3     1    23
## # ... with 97 more rows
g %>% ggraph("bipartite") + 
  geom_edge_link() + 
  theme_graph()

Case study: Bibliographic networks

Basics

Let's talk about bibliographic networks. In short, these are networks between documents which cite each other. These are commonly academic publications, but can also be patents or policy reports. Conceptually, we can see them as 2-mode networks between articles and their references. That helps us to apply some interesting metrics, such as:

  • Direct citations
  • Bibliographic coupling
  • Co-citations

Interestingly, different projections of this 2-mode network give the resulting 1-mode network a different meaning.
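To make the difference concrete, here is a minimal sketch on a made-up article-by-reference matrix (the object `cites` and its labels are hypothetical). Projecting onto the rows gives bibliographic coupling, projecting onto the columns gives co-citation.

```r
# Hypothetical article-by-reference matrix: 1 = article cites the reference
cites <- matrix(c(1, 1, 0,
                  1, 1, 1,
                  0, 0, 1),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("A1", "A2", "A3"), c("R1", "R2", "R3")))

# Bibliographic coupling: articles linked by the number of references they share
coupling <- tcrossprod(cites)    # same as cites %*% t(cites)

# Co-citation: references linked by the number of articles citing both of them
cocitation <- crossprod(cites)   # same as t(cites) %*% cites

coupling
cocitation
```

The same 2-mode data therefore yields a network of citing articles in one case and a network of cited references in the other.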

I will illustrate this in more detail in the following. The example is based on some of my own work.1

Doing it by hand

Let's imagine we do it the hard way. We download some bibliographic data and have to do all the munging on our own until we end up with a nice network representation. Let's go through some of these steps together.

Let's get started. I will load some bibliographic data (selection process explained in the paper) on articles concerned with the field of “Innovation Studies”. It already went through some upfront cleaning, but is very similar to what you get when you download data from WoS.

rm(list=ls())
articles <- readRDS(url("https://www.dropbox.com/s/oumm3n0km316im4/publications.RDS?dl=1"))
articles %<>%
  select(SR, AU, TI, JI, PY, AU_UN, DE, TC, NR, CR) %>%
  rename(article = SR,
         author = AU,
         title = TI,
         journal = JI,
         year = PY,
         affiliation = AU_UN,
         keywords = DE,
         citations = TC,
         references = NR,
         reference.list = CR)
articles %>%
  arrange(desc(citations)) %>%
  glimpse()
## Observations: 6,370
## Variables: 10
## $ article        <chr> "BARNEY J, 1991, J MANAGE", "COHEN WM, 1990, AD...
## $ author         <chr> "BARNEY J", "COHEN WM;LEVINTHAL DA", "KOGUT B;Z...
## $ title          <chr> "FIRM RESOURCES AND SUSTAINED COMPETITIVE ADVAN...
## $ journal        <chr> "J. MANAGE.", "ADM. SCI. Q.", "ORGAN SCI.", "AC...
## $ year           <dbl> 1991, 1990, 1992, 1998, 1996, 1990, 1993, 1990,...
## $ affiliation    <chr> "TEXAS AANDM UNIV SYST", "CARNEGIE MELLON UNIV;...
## $ keywords       <chr> NA, NA, "ORGANIZATIONAL KNOWLEDGE; TECHNOLOGY T...
## $ citations      <dbl> 14541, 11098, 4716, 4087, 3310, 2871, 2643, 200...
## $ references     <dbl> 53, 47, 34, 52, 34, 44, 48, 75, 6, 17, 21, 61, ...
## $ reference.list <chr> "ANDREWS K 1971 CONCEPT CORPORATE ST;ANSOFF H 1...

So, where are the links to the references? It's a bit messy: they are all found in the CR field (renamed to reference.list above), separated by ;.

articles[1, "reference.list"]

I will now transform them into an article \(\rightarrow\) reference edgelist. Since it's a lot of data, I will use the data.table package here. I usually avoid it, because I hate the syntax. However, it's just way faster, and when working with a large bibliometric corpus that matters.

citation.el <- data.table(article = articles$article, 
                          str_split_fixed(articles$reference.list, ";", max(articles$references, na.rm=T))) 

citation.el <- melt(citation.el, id.vars = "article")[, variable:= NULL][value!=""]

citation.el %<>%
  rename(reference = value) %>%
  arrange(article,reference)
citation.el %>% head()
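For readers less familiar with data.table, the same reshaping idea can be sketched in base R on toy data. This is an illustration of the logic only, not a replacement for the faster code above; the data frame `toy` and its values are made up.

```r
# Hypothetical records in the same shape as the articles table
toy <- data.frame(
  article = c("A1", "A2"),
  reference.list = c("R1;R2;R3", "R2;R4"),
  stringsAsFactors = FALSE
)

# Split each reference string on ";" into a list of character vectors
refs <- strsplit(toy$reference.list, ";", fixed = TRUE)

# Repeat each article id once per reference to obtain a long edge list
toy_el <- data.frame(
  article   = rep(toy$article, lengths(refs)),
  reference = trimws(unlist(refs)),
  stringsAsFactors = FALSE
)
toy_el
```

Each row of the result is one article-to-reference link, which is the format needed for the 2-mode matrix.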

Likewise, I will transform this into a sparse 2-mode matrix. I make it sparse because it's way more efficient.

library(Matrix)
mat <- spMatrix(nrow=length(unique(citation.el$article)),
                ncol=length(unique(citation.el$reference)),
                i = as.numeric(factor(citation.el$article)),
                j = as.numeric(factor(citation.el$reference)),
                x = rep(1, nrow(citation.el)))
row.names(mat) <- levels(factor(citation.el$article))
colnames(mat) <- levels(factor(citation.el$reference))
mat %>% str()
## Formal class 'dgTMatrix' [package "Matrix"] with 6 slots
##   ..@ i       : int [1:244252] 0 0 0 0 0 0 0 0 0 0 ...
##   ..@ j       : int [1:244252] 10526 14911 14934 15002 15291 17906 19745 20899 23183 23860 ...
##   ..@ Dim     : int [1:2] 6370 36611
##   ..@ Dimnames:List of 2
##   .. ..$ : chr [1:6370] "(HANS) DE HAAN J, 2011, TECHNOL FORECAST SOC CHANG" "AARSTAD J, 2016, RES POLICY" "ABDI M, 2012, J INT BUS STUD" "ABDIH Y, 2006, IMF STAFF PAP" ...
##   .. ..$ : chr [1:36611] "A D 1994 POSTBUREAUCRATIC ORG" "A W 1998 MANAGING TOTAL QUALI" "AAGE T 2004 DAN RES UN IND DYN D" "AAGE T 2006 THESIS COPENHAGEN BU" ...
##   ..@ x       : num [1:244252] 1 1 1 1 1 1 1 1 1 1 ...
##   ..@ factors : list()
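The indexing trick used above (factor levels serving as row and column positions) is easier to see on a tiny edge list. The sketch below uses Matrix::sparseMatrix() rather than the spMatrix() call above, and the edge list `el` is made up for illustration.

```r
library(Matrix)

# Hypothetical edge list in the same shape as citation.el
el <- data.frame(article   = c("A1", "A1", "A2", "A2", "A3"),
                 reference = c("R1", "R2", "R2", "R3", "R3"),
                 stringsAsFactors = FALSE)

# Factor levels give each distinct label a stable integer position
art <- factor(el$article)
ref <- factor(el$reference)

# One non-zero entry per edge; all other cells are implicitly zero
m <- sparseMatrix(i = as.integer(art), j = as.integer(ref),
                  x = rep(1, nrow(el)),
                  dims = c(nlevels(art), nlevels(ref)),
                  dimnames = list(levels(art), levels(ref)))
m
```

Only the five citation links are stored, which is why this representation scales to thousands of articles and tens of thousands of references.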

Here again, I use an efficient way to create the 1-mode projection. This is done by multiplying the matrix with its transposed version (m %*% t(m)). For those who still remember some matrix algebra, that will sound familiar.

mat.art <- tcrossprod(mat)
# mat.ref <- crossprod(mat)
rm(mat)

So far so good, let's put it in a graph. I also set the attributes right away.

g <- graph_from_adjacency_matrix(mat.art, 
                                 mode = "undirected", 
                                 weighted = T, 
                                 diag = F)
# Note: The graph is created with the original `igraph` functionality, since `tidygraph` up to now has issues with sparse matrices.
rm(mat.art)

We now simplify the network.

g <- g %>% simplify(remove.multiple = T, 
                    remove.loops = T, 
                    edge.attr.comb = "sum")

And finally create a tidygraph object.

g <- g %>% as_tbl_graph()
g
## # A tbl_graph: 6370 nodes and 3801377 edges
## #
## # An undirected simple graph with 1 component
## #
## # Node Data: 6,370 x 1 (active)
##   name                                              
##   <chr>                                             
## 1 (HANS) DE HAAN J, 2011, TECHNOL FORECAST SOC CHANG
## 2 AARSTAD J, 2016, RES POLICY                       
## 3 ABDI M, 2012, J INT BUS STUD                      
## 4 ABDIH Y, 2006, IMF STAFF PAP                      
## 5 ABDULLAH M, 2016, REV MANAG SCI                   
## 6 ABEBE GK, 2013, AGRIC SYST                        
## # ... with 6,364 more rows
## #
## # Edge Data: 3,801,377 x 3
##    from    to weight
##   <int> <int>  <dbl>
## 1     1     3      1
## 2     1     6      1
## 3     1    38      1
## # ... with 3.801e+06 more rows

Et voilà, we can start the analysis. However, you know the rest by now, so I will skip it here. Instead, I will show you how to do all of that much more conveniently.

rm(list=ls())

Fun with the bibliometrix package

Lately, the bibliometrix package has become extremely good, and by now it is almost suitable to replace my hand-made workflows. So, I will spare you the data munging and demonstrate how to use its nice built-in functionality here. By doing so, you will develop a lot of intuition on network projection and aggregation on different levels.

library(bibliometrix)

Loading the data

So, let's load some data. Since it is the topic of this lecture series, why not do a bibliographic mapping of the “innovation system” and “innovation ecosystem” literature. Here I use the Web of Science database on scientific literature, from which I downloaded the following query.

  • Data source: Clarivate Analytics Web of Science (http://apps.webofknowledge.com)
  • Data format: bibtex
  • Query: TOPIC: (“innovation system” OR “systems of innovation” OR “innovation ecosystem”)
  • Timespan: the beginning of time - March 2019
  • Document Type: Articles
  • Language: English
  • Query date: March, 2019
  • Selection: 1000 most cited

We now just read the plain data with readFiles() and convert it with the built-in convert2df() function.

M <- readFiles("https://www.dropbox.com/s/2jh33ktj3ox7ztu/biblio_nw1.txt?dl=1") 
M %<>%
  convert2df(dbsource = "isi",
             format = "plaintext")
## 
## Converting your isi collection into a bibliographic dataframe
## 
## Articles extracted   100 
## Articles extracted   200 
## Articles extracted   300 
## Articles extracted   400 
## Articles extracted   500 
## Done!
## 
## 
## Generating affiliation field tag AU_UN from C1:  Done!
M %>% glimpse()
## Observations: 500
## Variables: 64
## $ PT       <chr> "J", "J", "J", "J", "J", "J", "J", "J", "J", "J", "J"...
## $ AU       <chr> "RUBINOV M;SPORNS O", "LANGFELDER P;HORVATH S", "SMIT...
## $ AF       <chr> "RUBINOV, MIKAIL; SPORNS, OLAF", "LANGFELDER, PETER; ...
## $ TI       <chr> "COMPLEX NETWORK MEASURES OF BRAIN CONNECTIVITY: USES...
## $ SO       <chr> "NEUROIMAGE", "BMC BIOINFORMATICS", "PROCEEDINGS OF T...
## $ LA       <chr> "ENGLISH", "ENGLISH", "ENGLISH", "ENGLISH", "ENGLISH"...
## $ DT       <chr> "ARTICLE", "ARTICLE", "ARTICLE", "ARTICLE", "ARTICLE"...
## $ ID       <chr> "STATE FUNCTIONAL CONNECTIVITY; GRAPH-THEORETICAL ANA...
## $ AB       <chr> "BRAIN CONNECTIVITY DATASETS COMPRISE NETWORKS OF BRA...
## $ C1       <chr> "[SPORNS, OLAF] INDIANA UNIV, DEPT PSYCHOL & BRAIN SC...
## $ RP       <chr> "SPORNS, O (REPRINT AUTHOR), INDIANA UNIV, DEPT PSYCH...
## $ EM       <chr> "OSPORNS@INDIANA.EDU", "PETER.LANGFELDER@GMAIL.COM; S...
## $ RI       <chr> "SPORNS, OLAF/A-1667-2010", "KARAYEL, BORA/E-2173-201...
## $ OI       <chr> "SPORNS, OLAF/0000-0001-7265-4036", "KARAYEL, BORA/00...
## $ FU       <chr> "J.S. MCDONNELL FOUNDATION [JSMF22002082]; CSIRO ICT ...
## $ FX       <chr> "WE THANK ROLF KOTTER, PATRIC HAGMANN, AVIAD RUBINSTE...
## $ CR       <chr> "ACHARD S, 2006, J NEUROSCI, V26, P63, DOI 10.1523/JN...
## $ NR       <chr> "69", "48", "38", "30", "94", "37", "34", "18", "42",...
## $ TC       <dbl> 2848, 2152, 2004, 1790, 1274, 752, 703, 682, 672, 601...
## $ Z9       <chr> "2911", "2175", "2021", "1815", "1304", "763", "716",...
## $ U1       <chr> "35", "38", "12", "42", "11", "9", "11", "11", "4", "...
## $ U2       <chr> "393", "39", "208", "431", "130", "151", "150", "123"...
## $ PU       <chr> "ACADEMIC PRESS INC ELSEVIER SCIENCE", "BIOMED CENTRA...
## $ PI       <chr> "SAN DIEGO", "LONDON", "WASHINGTON", "LONDON", "WASHI...
## $ PA       <chr> "525 B ST, STE 1900, SAN DIEGO, CA 92101-4495 USA", "...
## $ SN       <chr> "1053-8119", "1471-2105", "0027-8424", "0028-0836", "...
## $ EI       <chr> "1095-9572", NA, NA, "1476-4687", NA, "1476-4687", NA...
## $ J9       <chr> "NEUROIMAGE", "BMC BIOINFORMATICS", "P NATL ACAD SCI ...
## $ JI       <chr> "NEUROIMAGE", "BMC BIOINFORMATICS", "PROC. NATL. ACAD...
## $ PD       <chr> "SEP", "DEC 29", "AUG 4", "NOV 1", "FEB 11", "JUN 16"...
## $ PY       <dbl> 2010, 2008, 2009, 2012, 2009, 2011, 2013, 2009, 2009,...
## $ VL       <chr> "52", "9", "106", "491", "29", "474", "45", "106", "3...
## $ IS       <chr> "3", NA, "31", "7422", "6", "7351", "1", "36", NA, "1...
## $ BP       <chr> "1059", NA, "13040", "119", "1860", "380", "25", "152...
## $ EP       <chr> "1069", NA, "13045", "124", "1873", "+", "U52", "1527...
## $ DI       <chr> "10.1016/J.NEUROIMAGE.2009.10.003", "10.1186/1471-210...
## $ PG       <chr> "11", "13", "6", "6", "14", "2", "11", "5", "7", "29"...
## $ WC       <chr> "NEUROSCIENCES; NEUROIMAGING; RADIOLOGY, NUCLEAR MEDI...
## $ SC       <chr> "NEUROSCIENCES & NEUROLOGY; RADIOLOGY, NUCLEAR MEDICI...
## $ GA       <chr> "629FY", "402FP", "479NT", "028PM", "406NC", "777TD",...
## $ UT       <chr> "ISI000280181800027", "ISI000262999900002", "ISI00026...
## $ PM       <chr> "19819337", "19114008", "19620724", "23128233", "1921...
## $ DA       <chr> "2018-10-04", "2018-10-04", "2018-10-04", "2018-10-04...
## $ ER       <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "...
## $ AR       <chr> NA, "559", NA, NA, NA, NA, NA, NA, NA, NA, NA, "E1000...
## $ OA       <chr> NA, "GOLD", "GOLD_OR_BRONZE", "GREEN_ACCEPTED", "GOLD...
## $ DE       <chr> NA, NA, "BRAIN CONNECTIVITY; BRAINMAP; FMRI; FUNCTION...
## $ CA       <chr> NA, NA, NA, "INT IBD GENETICS CONSORTIUM IIBDGC", NA,...
## $ SU       <chr> NA, NA, NA, NA, NA, NA, NA, NA, "S", NA, NA, NA, NA, ...
## $ BE       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ SE       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ BN       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ PN       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ CT       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ CY       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ CL       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ SP       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ SI       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ DB       <chr> "ISI", "ISI", "ISI", "ISI", "ISI", "ISI", "ISI", "ISI...
## $ AU_UN    <chr> "INDIANA UNIV;UNIV NEW S WALES;UNIV NEW S WALES;QUEEN...
## $ AU1_UN   <chr> "INDIANA UNIV", "UNIV CALIF LOS ANGELES", "UNIV OXFOR...
## $ AU_UN_NR <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ SR_FULL  <chr> "RUBINOV M, 2010, NEUROIMAGE", "LANGFELDER P, 2008, B...
## $ SR       <chr> "RUBINOV M, 2010, NEUROIMAGE", "LANGFELDER P, 2008, B...

To figure out what the fields mean, check the WoS field tags.

Descriptive Analysis

Although bibliometrics is mainly known for quantifying the scientific production and measuring its quality and impact, it is also useful for displaying and analysing the intellectual, conceptual and social structures of research as well as their evolution and dynamical aspects.

In this way, bibliometrics aims to describe how specific disciplines, scientific domains, or research fields are structured and how they evolve over time. In other words, bibliometric methods help to map the science (so-called science mapping) and are very useful in the case of research synthesis, especially for the systematic ones.

Bibliometrics is an academic science founded on a set of statistical methods, which can be used to quantitatively analyze scientific big data and its evolution over time, and to discover information. Network structure is often used to model the interaction among authors, papers/documents/articles, references, keywords, etc.

Bibliometrix is an open-source software for automating the stages of data analysis and data visualization. After converting and uploading bibliographic data in R, Bibliometrix performs a descriptive analysis and different research-structure analyses.

Descriptive analysis provides some snapshots about the annual research development, the top “k” productive authors, papers, countries and most relevant keywords.

Main findings about the collection

results <- biblioAnalysis(M)
summary(results, 
        k = 20, 
        pause = F)
## 
## 
## Main Information about data
## 
##  Documents                             500 
##  Sources (Journals, Books, etc.)       268 
##  Keywords Plus (ID)                    2480 
##  Author's Keywords (DE)                1200 
##  Period                                2008 - 2016 
##  Average citations per documents       150.6 
## 
##  Authors                               3562 
##  Author Appearances                    3889 
##  Authors of single-authored documents  27 
##  Authors of multi-authored documents   3535 
##  Single-authored documents             28 
## 
##  Documents per Author                  0.14 
##  Authors per Document                  7.12 
##  Co-Authors per Documents              7.78 
##  Collaboration Index                   7.49 
##  
##  Document types                     
##  ARTICLE                             478 
##  ARTICLE; BOOK CHAPTER               4 
##  ARTICLE; PROCEEDINGS PAPER          17 
##  ARTICLE; RETRACTED PUBLICATION      1 
##  
## 
## Annual Scientific Production
## 
##  Year    Articles
##     2008       65
##     2009       92
##     2010       83
##     2011       79
##     2012       66
##     2013       38
##     2014       40
##     2015       27
##     2016       10
## 
## Annual Percentage Growth Rate -20.86186 
## 
## 
## Most Productive Authors
## 
##    Authors        Articles Authors        Articles Fractionalized
## 1   HORVATH S           20  HORVATH S                        3.88
## 2   GESCHWIND DH        12  LEYDESDORFF L                    2.33
## 3   LANGFELDER P         8  DEARING JW                       2.00
## 4   MILLER JA            7  LANGFELDER P                     1.92
## 5   HE Y                 6  GESCHWIND DH                     1.66
## 6   BORSBOOM D           5  BODIN O                          1.50
## 7   COPPOLA G            5  BOSCHMA R                        1.50
## 8   ZHANG B              5  DAWSON S                         1.50
## 9   BASSETT DS           4  DING Y                           1.50
## 10  BULLMORE ET          4  ERNSTSON H                       1.33
## 11  CHO JH               4  INGOLD K                         1.33
## 12  GAO FY               4  JORDAN F                         1.25
## 13  KNIGHT R             4  BRANDES U                        1.17
## 14  LEYDESDORFF L        4  BLUTHGEN N                       1.14
## 15  MENON V              4  BORSBOOM D                       1.13
## 16  MILL J               4  MILLER JA                        1.13
## 17  OLDHAM MC            4  SCHENSUL JJ                      1.09
## 18  OPHOFF RA            4  MENON V                          1.07
## 19  SAITO K              4  HE Y                             1.06
## 20  SMITH SM             4  ASHTON W                         1.00
## 
## 
## Top manuscripts per citations
## 
##                            Paper            TC TCperYear
## 1  RUBINOV M, 2010, NEUROIMAGE            2848     316.4
## 2  LANGFELDER P, 2008, BMC BIOINFORMATICS 2152     195.6
## 3  SMITH SM, 2009, P NATL ACAD SCI USA    2004     200.4
## 4  JOSTINS L, 2012, NATURE                1790     255.7
## 5  BUCKNER RL, 2009, J NEUROSCI           1274     127.4
## 6  VOINEAGU I, 2011, NATURE                752      94.0
## 7  DELOUKAS P, 2013, NAT GENET             703     117.2
## 8  EAGLE N, 2009, P NATL ACAD SCI USA      682      68.2
## 9  CHEN J, 2009, NUCLEIC ACIDS RES         672      67.2
## 10 THIELE I, 2010, NAT PROTOC              601      66.8
## 11 FRANSSON P, 2008, NEUROIMAGE            572      52.0
## 12 SUPEKAR K, 2008, PLOS COMPUT BIOL       539      49.0
## 13 XUE J, 2014, IMMUNITY                   531     106.2
## 14 FOWLER JH, 2008, BRIT MED J             503      45.7
## 15 MILL J, 2008, AM J HUM GENET            480      43.6
## 16 BAILEY P, 2016, NATURE                  452     150.7
## 17 AIROLDI EM, 2008, J MACH LEARN RES      443      40.3
## 18 SUPEKAR K, 2009, PLOS BIOL              413      41.3
## 19 BARBERAN A, 2012, ISME J                383      54.7
## 20 GARDY JL, 2011, NEW ENGL J MED          369      46.1
## 
## 
## Corresponding Author's Countries
## 
##           Country Articles    Freq SCP MCP MCP_Ratio
## 1  USA                 231 0.46293 161  70     0.303
## 2  CHINA                35 0.07014  18  17     0.486
## 3  UNITED KINGDOM       34 0.06814  16  18     0.529
## 4  NETHERLANDS          27 0.05411  15  12     0.444
## 5  GERMANY              26 0.05210  14  12     0.462
## 6  CANADA               20 0.04008   9  11     0.550
## 7  ITALY                18 0.03607   7  11     0.611
## 8  AUSTRALIA            16 0.03206   6  10     0.625
## 9  SPAIN                11 0.02204   3   8     0.727
## 10 SWEDEN               11 0.02204   6   5     0.455
## 11 SWITZERLAND          10 0.02004   6   4     0.400
## 12 FRANCE                7 0.01403   4   3     0.429
## 13 KOREA                 7 0.01403   4   3     0.429
## 14 JAPAN                 6 0.01202   6   0     0.000
## 15 BELGIUM               5 0.01002   1   4     0.800
## 16 AUSTRIA               4 0.00802   2   2     0.500
## 17 IRELAND               4 0.00802   2   2     0.500
## 18 FINLAND               3 0.00601   1   2     0.667
## 19 BRAZIL                2 0.00401   0   2     1.000
## 20 CUBA                  2 0.00401   1   1     0.500
## 
## 
## SCP: Single Country Publications
## 
## MCP: Multiple Country Publications
## 
## 
## Total Citations per Country
## 
##      Country      Total Citations Average Article Citations
## 1  USA                      39460                     170.8
## 2  UNITED KINGDOM            7023                     206.6
## 3  CHINA                     3819                     109.1
## 4  CANADA                    3440                     172.0
## 5  GERMANY                   3344                     128.6
## 6  NETHERLANDS               3132                     116.0
## 7  AUSTRALIA                 2128                     133.0
## 8  ITALY                     2046                     113.7
## 9  SWEDEN                    1502                     136.5
## 10 SPAIN                     1265                     115.0
## 11 SWITZERLAND               1141                     114.1
## 12 JAPAN                     1002                     167.0
## 13 FRANCE                     801                     114.4
## 14 KOREA                      735                     105.0
## 15 IRELAND                    650                     162.5
## 16 AUSTRIA                    540                     135.0
## 17 BELGIUM                    389                      77.8
## 18 GREECE                     384                     192.0
## 19 FINLAND                    324                     108.0
## 20 INDIA                      280                     140.0
## 
## 
## Most Relevant Sources
## 
##                                                                     Sources        Articles
## 1  PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA       25
## 2  PLOS ONE                                                                              22
## 3  NEUROIMAGE                                                                            15
## 4  NATURE                                                                                10
## 5  ISME JOURNAL                                                                           9
## 6  NUCLEIC ACIDS RESEARCH                                                                 9
## 7  CELL                                                                                   7
## 8  GENOME RESEARCH                                                                        7
## 9  BIOINFORMATICS                                                                         6
## 10 BMC BIOINFORMATICS                                                                     6
## 11 PLOS GENETICS                                                                          6
## 12 BRAIN                                                                                  5
## 13 CANCER RESEARCH                                                                        5
## 14 JOURNAL OF INFORMETRICS                                                                5
## 15 MOLECULAR SYSTEMS BIOLOGY                                                              5
## 16 BMC GENOMICS                                                                           4
## 17 DECISION SUPPORT SYSTEMS                                                               4
## 18 EXPERT SYSTEMS WITH APPLICATIONS                                                       4
## 19 JOURNAL OF NEUROSCIENCE                                                                4
## 20 LANDSCAPE AND URBAN PLANNING                                                           4
## 
## 
## Most Relevant Keywords
## 
##    Author Keywords (DE)      Articles  Keywords-Plus (ID)     Articles
## 1   SOCIAL NETWORK ANALYSIS        43 NETWORK ANALYSIS              41
## 2   NETWORK ANALYSIS               41 EXPRESSION                    32
## 3   GRAPH THEORY                   14 GENE EXPRESSION               29
## 4   SOCIAL NETWORKS                13 NETWORKS                      26
## 5   SYSTEMS BIOLOGY                10 ORGANIZATION                  25
## 6   FUNCTIONAL CONNECTIVITY         9 IDENTIFICATION                24
## 7   CONNECTIVITY                    7 COMPLEX NETWORKS              22
## 8   FMRI                            7 CENTRALITY                    21
## 9   NETWORK                         7 DISEASE                       21
## 10  CENTRALITY                      6 DYNAMICS                      20
## 11  RESTING STATE                   6 PATTERNS                      17
## 12  TRACTOGRAPHY                    6 ALZHEIMERS DISEASE            16
## 13  CLUSTERING                      5 EVOLUTION                     16
## 14  MICROARRAY                      5 MODEL                         16
## 15  NETWORKS                        5 COMMUNITY STRUCTURE           15
## 16  COMMUNITY                       4 ESCHERICHIA COLI              15
## 17  COMPLEX NETWORKS                4 FUNCTIONAL CONNECTIVITY       15
## 18  DIFFUSION TENSOR IMAGING        4 PERFORMANCE                   15
## 19  GENE EXPRESSION                 4 BEHAVIOR                      14
## 20  METABOLOMICS                    4 MASS SPECTROMETRY             14
plot(results)

Most Cited References (internally)

CR <- citations(M, 
                field = "article", 
                sep = ";")
cbind(CR$Cited[1:10]) %>% head()
##                                                                            [,1]
## WASSERMAN S, 1994, SOCIAL NETWORK ANAL                                       63
## WATTS DJ, 1998, NATURE, V393, P440, DOI 10.1038/30918                        49
## ZHANG B, 2005, STAT APPL GENET MO B, V4, DOI 10.2202/1544-6115.1128          47
## FREEMAN LC, 1979, SOC NETWORKS, V1, P215, DOI 10.1016/0378-8733(78)90021-7   42
## LANGFELDER P, 2008, BMC BIOINFORMATICS, V9, DOI 10.1186/1471-2105-9-559      37
## SHANNON P, 2003, GENOME RES, V13, P2498, DOI 10.1101/GR.1239303              29

Bibliographic Coupling Analysis: The Knowledge Frontier of the Field

Bibliographic coupling, which links documents that share cited references, has turned out to be very well suited to capturing a field's current knowledge frontier. I will show you how to do it here, but in case you are interested, read my paper :)

NetMatrix <- biblioNetwork(M, 
                           analysis = "coupling", 
                           network = "references", 
                           sep = ";")
net <-networkPlot(NetMatrix, 
            n = 50, 
            Title = "Bibliographic Coupling Network", 
            type = "fruchterman", 
            size.cex = TRUE, 
            size = 20, 
            remove.multiple = FALSE, 
            labelsize = 0.7,
            edgesize = 10, 
            edges.min = 5)
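
To make the idea behind the coupling matrix concrete, here is a minimal base-R sketch. It assumes a hypothetical toy incidence matrix `A` (documents in rows, cited references in columns); the names `doc1`, `refA` and so on are made up for illustration and do not come from the dataset used above. Two documents are coupled by the number of references they share.

```r
# Hypothetical toy data: rows are citing documents, columns are cited references.
# A[i, j] = 1 means document i cites reference j.
A <- matrix(c(1, 1, 0, 1,
              1, 0, 1, 1,
              0, 0, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("doc1", "doc2", "doc3"),
                            c("refA", "refB", "refC", "refD")))

# Coupling strength = number of references two documents have in common.
coupling <- A %*% t(A)

# The diagonal only counts each document's own references, so set it to zero.
diag(coupling) <- 0
coupling
```

In this sketch `doc1` and `doc2` share two references (`refA` and `refD`), so they are the most strongly coupled pair.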

Co-citation Analysis: The Intellectual Structure and Knowledge Bases of the field

Citation analysis is one of the main classic techniques in bibliometrics. It shows the structure of a specific field through the linkages between nodes (e.g. authors, papers, journals), while the edges can be interpreted differently depending on the network type, namely co-citation, direct citation, or bibliographic coupling.

Below there are three examples.

  • First, a co-citation network that shows relations between cited-reference works (nodes).
  • Second, a co-citation network that uses cited journals as the unit of analysis. Useful dimensions for commenting on co-citation networks are: (i) centrality and peripherality of nodes, (ii) their proximity and distance, (iii) strength of ties, (iv) clusters, (v) bridging contributions.
  • Third, a historiograph built on direct citations. It draws the intellectual linkages in historical order. The cited works of thousands of authors contained in a collection of published scientific articles are sufficient for reconstructing the historiographic structure of the field, calling out the basic works in it.

Co-citation (cited references) analysis

Plot options:

  • n = 50 (the function plots the main 50 cited references)
  • type = “fruchterman” (the network layout is generated using the Fruchterman-Reingold Algorithm)
  • size.cex = TRUE (the size of the vertices is proportional to their degree)
  • size = 20 (the max size of vertices)
  • remove.multiple=FALSE (multiple edges are not removed)
  • labelsize = 0.7 (defines the size of vertex labels)
  • edgesize = 10 (The thickness of the edges is proportional to their strength. Edgesize defines the max value of the thickness)
  • edges.min = 5 (plots only edges with a strength greater than or equal to 5)
  • all other arguments assume the default values
NetMatrix <- biblioNetwork(M, 
                           analysis = "co-citation", 
                           network = "references", 
                           sep = ";")
net <-networkPlot(NetMatrix, 
            n = 50, 
            Title = "Co-Citation Network", 
            type = "fruchterman", 
            size.cex = TRUE, 
            size = 20, 
            remove.multiple = FALSE, 
            labelsize = 0.7,
            edgesize = 10, 
            edges.min = 5)
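
As a rough illustration of what a co-citation matrix contains, the following base-R sketch counts how often two references appear together in the same reference list. The incidence matrix `cites` and all its labels are hypothetical toy data, not taken from the collection analysed here.

```r
# Hypothetical toy data: rows are citing documents, columns are cited references.
cites <- matrix(c(1, 1, 0, 1,
                  1, 0, 1, 1,
                  0, 0, 1, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("doc1", "doc2", "doc3"),
                                c("refA", "refB", "refC", "refD")))

# Co-citation strength = number of documents citing both references.
# crossprod(cites) is the same as t(cites) %*% cites.
cocitation <- crossprod(cites)

# The diagonal holds plain citation counts per reference; drop it for the network.
diag(cocitation) <- 0
cocitation
```

Here `refA` and `refD` are cited together by two documents, which makes them the strongest co-citation pair in the toy example.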

Cited Journal (Source) co-citation analysis

M <- metaTagExtraction(M, "CR_SO", sep=";")
NetMatrix <- biblioNetwork(M, 
                           analysis = "co-citation", 
                           network = "sources", 
                           sep = ";")
net <-networkPlot(NetMatrix, 
            n = 50, 
            Title = "Co-Citation Network", 
            type = "auto", 
            size.cex = TRUE, 
            size = 15, 
            remove.multiple = FALSE, 
            labelsize = 0.7,
            edgesize = 10, 
            edges.min = 5)

Some summary statistics. I will only provide them here, but they are available for all objects created with biblioNetwork().

netstat <- networkStat(NetMatrix)
summary(netstat, k = 10)
## 
## 
## Main statistics about the network
## 
##  Size                                  7562 
##  Density                               0.012 
##  Transitivity                          0.274 
##  Diameter                              6 
##  Degree Centralization                 0.502 
##  Average path length                   2.359 
## 
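
To see what two of these statistics measure, here is a small base-R sketch on a hypothetical four-node undirected network (`adj` and its node names are invented for illustration). It uses the common textbook definitions of density and global transitivity; how `networkStat()` computes them internally is not shown here.

```r
# Hypothetical undirected toy network with edges a-b, a-c, a-d and b-c.
adj <- matrix(0, nrow = 4, ncol = 4,
              dimnames = list(letters[1:4], letters[1:4]))
adj["a", "b"] <- adj["b", "a"] <- 1
adj["a", "c"] <- adj["c", "a"] <- 1
adj["a", "d"] <- adj["d", "a"] <- 1
adj["b", "c"] <- adj["c", "b"] <- 1

n       <- nrow(adj)
n_edges <- sum(adj[upper.tri(adj)] > 0)

# Density: observed edges divided by the number of possible edges.
density <- n_edges / (n * (n - 1) / 2)

# Global transitivity: 3 * triangles / connected triples.
degree       <- rowSums(adj)
triangles    <- sum(diag(adj %*% adj %*% adj)) / 6
triples      <- sum(choose(degree, 2))
transitivity <- 3 * triangles / triples

c(density = density, transitivity = transitivity)
```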

Note: By the way, the results contain a “hidden” igraph object. That is new, and makes further analysis of the results possible. Great!

str(net, max.level = 2)
## List of 6
##  $ graph      :List of 10
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..- attr(*, "class")= chr "igraph"
##  $ graph_pajek:List of 10
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..- attr(*, "class")= chr "igraph"
##  $ cluster_obj:List of 5
##   ..$ merges    : chr [1:4] "soc networks" "social network anal" "annu rev sociol" "am j sociol"
##   ..$ modularity: chr [1:5] "phys rev e" "phys rev lett" "physica a" "siam rev" ...
##   ..$ membership: chr [1:15] "j neurosci" "plos comput biol" "science" "p natl acad sci usa" ...
##   ..$ names     : chr [1:6] "admin sci quart" "acad manage j" "manage sci" "acad manage rev" ...
##   ..$ vcount    : chr [1:20] "bmc syst biol" "nat biotechnol" "nat genet" "bioinformatics" ...
##   ..- attr(*, "class")= chr "communities"
##  $ cluster_res:'data.frame': 50 obs. of  3 variables:
##   ..$ vertex        : Factor w/ 50 levels "acad manage j",..: 47 48 7 5 38 39 40 46 49 20 ...
##   ..$ cluster       : num [1:50] 1 1 1 1 2 2 2 2 2 3 ...
##   ..$ btw_centrality: num [1:50] 8.436 2.965 0.443 4.139 4.116 ...
##  $ layout     : num [1:50, 1:2] -0.373 -0.442 -0.244 -0.338 -0.277 ...
##  $ S          : NULL
net$graph %>% as_tbl_graph()
## # A tbl_graph: 50 nodes and 17836 edges
## #
## # An undirected multigraph with 1 component
## #
## # Node Data: 50 x 6 (active)
##   name                  deg  size label.cex color   community
##   <chr>               <dbl> <dbl>     <dbl> <chr>       <dbl>
## 1 j neurosci           2944  4.02       0.7 #4DAF4A         3
## 2 plos comput biol     3483  4.76       0.7 #4DAF4A         3
## 3 science             10168 13.9        0.7 #4DAF4A         3
## 4 p natl acad sci usa 10982 15          0.7 #4DAF4A         3
## 5 nat rev neurosci     1888  2.58       0.7 #4DAF4A         3
## 6 bmc syst biol        2008  2.74       0.7 #FF7F00         5
## # ... with 44 more rows
## #
## # Edge Data: 17,836 x 6
##    from    to color     lty   num width
##   <int> <int> <chr>   <dbl> <dbl> <dbl>
## 1     1     2 #4DAF4A     1    39  1.70
## 2     1     2 #4DAF4A     1    39  1.70
## 3     1     2 #4DAF4A     1    39  1.70
## # ... with 1.783e+04 more rows

The conceptual structure and context - Co-Word Analysis

Co-word networks show the conceptual structure of a field by uncovering links between concepts through term co-occurrences.

Conceptual structure is often used to understand the topics covered by scholars (so-called research front) and identify what are the most important and the most recent issues.

Dividing the whole timespan in different timeslices and comparing the conceptual structures is useful to analyze the evolution of topics over time.

Bibliometrix is able to analyze keywords, but also the terms in the articles’ titles and abstracts. It does so using network analysis, correspondence analysis (CA), or multiple correspondence analysis (MCA). CA and MCA visualise the conceptual structure in a two-dimensional plot.

We can even do way more fancy stuff with abstracts or full texts (and do so). However, I don't want to spoil Roman's sessions, so I will hold back here.

Co-word Analysis through Keyword co-occurrences

Plot options:

  • normalize = “association” (the vertex similarities are normalized using association strength)
  • n = 50 (the function plots the main 50 keywords)
  • type = “fruchterman” (the network layout is generated using the Fruchterman-Reingold Algorithm)
  • size.cex = TRUE (the size of the vertices is proportional to their degree)
  • size = 20 (the max size of the vertices)
  • remove.multiple=FALSE (multiple edges are not removed)
  • labelsize = 3 (defines the max size of vertex labels)
  • label.cex = TRUE (The vertex label sizes are proportional to their degree)
  • edgesize = 10 (The thickness of the edges is proportional to their strength. Edgesize defines the max value of the thickness)
  • label.n = 50 (labels are plotted only for the main 50 vertices)
  • edges.min = 2 (plots only edges with a strength greater than or equal to 2)
  • all other arguments assume the default values
NetMatrix <- biblioNetwork(M, 
                           analysis = "co-occurrences", 
                           network = "keywords", 
                           sep = ";")
# net <- networkPlot(NetMatrix, 
#                    normalize = "association", 
#                    n = 50, 
#                    Title = "Keyword Co-occurrences", 
#                    type = "fruchterman", 
#                    size.cex = TRUE, size = 20, remove.multiple = FALSE, 
#                    edgesize = 10, 
#                    labelsize = 3,
#                    label.cex = TRUE,
#                    label.n = 50,
#                    edges.min = 2)
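
The `normalize = "association"` option rescales raw co-occurrence counts. As a hedged illustration, the sketch below applies the commonly used association-strength formula, s_ij = c_ij / (c_i * c_j), to a hypothetical keyword matrix whose diagonal holds each keyword's total occurrences. The keywords and counts are invented, and the exact scaling used inside bibliometrix may differ.

```r
# Hypothetical keyword co-occurrence matrix; the diagonal holds total occurrences.
cooc <- matrix(c(10, 4, 2,
                  4, 8, 0,
                  2, 0, 5),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("network", "graph", "brain"),
                               c("network", "graph", "brain")))

occ <- diag(cooc)

# Association strength: co-occurrences relative to what the
# individual keyword frequencies would lead us to expect.
assoc <- cooc / outer(occ, occ)
diag(assoc) <- 0
round(assoc, 3)
```

The point of the normalization is that very frequent keywords no longer dominate the network simply because they occur often.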

Co-word Analysis through Correspondence Analysis

You already saw that coming, right?

CS <- conceptualStructure(M, 
                          method = "CA", 
                          field = "ID", 
                          minDegree = 10, 
                          k.max = 8, 
                          stemming = FALSE, 
                          labelsize = 8,
                          documents = 20)

Thematic Map

Co-word analysis draws clusters of keywords. They are considered themes, whose density and centrality can be used to classify them and map them in a two-dimensional diagram.

The thematic map is a very intuitive plot, and we can analyze themes according to the quadrant in which they are placed: (1) upper-right quadrant: motor themes; (2) lower-right quadrant: basic themes; (3) lower-left quadrant: emerging or disappearing themes; (4) upper-left quadrant: very specialized/niche themes.

Please see Cobo, M. J., López-Herrera, A. G., Herrera-Viedma, E., & Herrera, F. (2011). An approach for detecting, quantifying, and visualizing the evolution of a research field: A practical application to the fuzzy sets theory field. Journal of Informetrics, 5(1), 146-166.
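
The quadrant logic can be sketched in a few lines of base R. The data frame `themes` below is hypothetical, and splitting at the median of each axis is an assumption made for illustration; the cut-off used by `thematicMap()` may be different.

```r
# Hypothetical themes with centrality (x-axis) and density (y-axis) scores.
themes <- data.frame(
  theme      = c("t1", "t2", "t3", "t4"),
  centrality = c(0.9, 0.8, 0.2, 0.1),
  density    = c(0.7, 0.2, 0.1, 0.6),
  stringsAsFactors = FALSE
)

# Assumed cut-offs: the median of each dimension.
cut_c <- median(themes$centrality)
cut_d <- median(themes$density)

# Assign each theme to one of the four quadrants described above.
themes$quadrant <- ifelse(
  themes$centrality >= cut_c,
  ifelse(themes$density >= cut_d, "motor", "basic"),
  ifelse(themes$density >= cut_d, "niche", "emerging or disappearing")
)
themes
```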

NetMatrix <- biblioNetwork(M, 
                           analysis = "co-occurrences",
                           network = "keywords", 
                           sep = ";")

S <- normalizeSimilarity(NetMatrix, 
                         type = "association")
Map <- thematicMap(M,
                   minfreq =5 )
plot(Map$map)

Let's inspect the clusters we found:

clusters <-Map$words %>%
  arrange(Cluster, desc(Occurrences))

clusters %>%
  select(Cluster, Words, Occurrences) %>%
  group_by(Cluster) %>%
  mutate(n.rel = Occurrences / sum(Occurrences) ) %>%
  slice(1:3)

The social structure - Collaboration Analysis

Collaboration networks show how authors, institutions (e.g. universities or departments) and countries relate to others in a specific field of research. For example, the first figure below is a co-author network. It discovers regular study groups, hidden groups of scholars, and pivotal authors. The second figure is called “Edu collaboration network” and uncovers relevant institutions in a specific research field and their relations.

Author collaboration network

NetMatrix <- biblioNetwork(M, 
                           analysis = "collaboration",  
                           network = "authors", 
                           sep = ";")

S <- normalizeSimilarity(NetMatrix, type = "jaccard")

net <- networkPlot(S,  
                   n = 50, 
                   Title = "Author collaboration",
                   type = "auto", 
                   size = 10,
                   weighted = TRUE,
                   remove.isolates = TRUE,
                   size.cex = TRUE,
                   edgesize = 1,
                   labelsize = 0.6)
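
For intuition about the `type = "jaccard"` normalization used above, here is a minimal base-R sketch of the standard Jaccard index on a hypothetical co-authorship matrix. The author names and counts are invented; the diagonal is assumed to hold each author's number of papers.

```r
# Hypothetical co-authorship counts; the diagonal holds papers per author.
collab <- matrix(c(6, 3, 1,
                   3, 4, 0,
                   1, 0, 2),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("anna", "ben", "carl"),
                                 c("anna", "ben", "carl")))

n_papers <- diag(collab)

# Jaccard index: joint papers divided by papers written by either author.
union_size <- outer(n_papers, n_papers, "+") - collab
jaccard    <- collab / union_size
diag(jaccard) <- 0
round(jaccard, 3)
```

Compared with raw counts, this puts prolific and less prolific authors on a comparable scale, since the tie strength is expressed as a share of their combined output.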

Edu collaboration network

NetMatrix <- biblioNetwork(M, 
                           analysis = "collaboration",  
                           network = "universities", 
                           sep = ";")

net <- networkPlot(NetMatrix,  
                   n = 50, 
                   Title = "Edu collaboration",
                   type = "auto", 
                   size = 10,
                   size.cex = T,
                   edgesize = 3,
                   labelsize = 0.6)

Country collaboration network

M <- metaTagExtraction(M, 
                       Field = "AU_CO", 
                       sep = ";")

NetMatrix <- biblioNetwork(M, 
                           analysis = "collaboration",  
                           network = "countries", 
                           sep = ";")

net <- networkPlot(NetMatrix,  
                   n = dim(NetMatrix)[1], 
                   Title = "Country collaboration",
                   type = "sphere", 
                   cluster = "louvain",
                   weighted = TRUE,
                   size = 10,
                   size.cex = TRUE,
                   edgesize = 1,
                   labelsize = 0.6)

Isn’t that all a lot of fun?

By now you should have realized that different levels of projection and aggregation offer almost endless possibilities for the analysis of bibliographic data!

By the way: We can also do all of that with tidygraph and ggraph

g <- NetMatrix %>% as.matrix() %>% as_tbl_graph(directed = FALSE)
g
## # A tbl_graph: 57 nodes and 461 edges
## #
## # An undirected multigraph with 1 component
## #
## # Node Data: 57 x 1 (active)
##   name          
##   <chr>         
## 1 AUSTRALIA     
## 2 USA           
## 3 UNITED KINGDOM
## 4 FRANCE        
## 5 BELGIUM       
## 6 CANADA        
## # ... with 51 more rows
## #
## # Edge Data: 461 x 3
##    from    to weight
##   <int> <int>  <dbl>
## 1     1     1     26
## 2     1     2      9
## 3     1     3      5
## # ... with 458 more rows
g <- g %N>%
    mutate(community = as.factor(group_louvain(weights = weight))) 
g %N>%
  mutate(dgr = centrality_degree(weights = weight)) %>%
  arrange(desc(dgr)) %>%
  slice(1:200) %>%
  ggraph(layout = 'fr') + 
  geom_edge_link(aes(width = weight), alpha = 0.2, colour = "grey") + 
  geom_node_point(aes(colour = community, size = dgr)) + 
  geom_node_text(aes(label = name), size = 1, repel = FALSE) +
  theme_graph()
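
One detail worth noting: the edge list printed above starts with a loop from node 1 to itself, which presumably stems from the diagonal of the collaboration matrix. If such loops are not wanted, one option is to zero the diagonal before building the graph. The base-R sketch below shows this on a hypothetical matrix `m` with made-up labels and counts, and then derives a loop-free edge list from the upper triangle.

```r
# Hypothetical symmetric collaboration matrix with a non-zero diagonal.
m <- matrix(c(26,  9,  5,
               9, 40,  3,
               5,  3, 12),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("X", "Y", "Z"), c("X", "Y", "Z")))

# Remove self-loops by zeroing the diagonal.
diag(m) <- 0

# Build an undirected edge list from the upper triangle only,
# so that each pair appears exactly once.
idx <- which(upper.tri(m) & m > 0, arr.ind = TRUE)
edges <- data.frame(from   = rownames(m)[idx[, "row"]],
                    to     = colnames(m)[idx[, "col"]],
                    weight = m[idx],
                    stringsAsFactors = FALSE)
edges
```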

Your turn

Please do Exercise 1 in the corresponding section on Github. This time you are about to do your own bibliographic analysis!

Endnotes

References

More info

You can find more info about:

Session info

sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17134)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] ggpubr_0.2.2        randomNames_1.4-0.0 data.table_1.12.2   bibliometrix_2.2.1  Matrix_1.2-17      
##  [6] ggraph_1.0.2        tidygraph_1.1.2     igraph_1.2.4.1      magrittr_1.5        forcats_0.4.0      
## [11] stringr_1.4.0       dplyr_0.8.3         purrr_0.3.2         readr_1.3.1         tidyr_0.8.3        
## [16] tibble_2.1.3        ggplot2_3.2.1       tidyverse_1.2.1     pacman_0.5.1        knitr_1.24         
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-141          lubridate_1.7.4       RColorBrewer_1.1-2    httr_1.4.1            SnowballC_0.6.0      
##  [6] tools_3.6.1           backports_1.1.4       utf8_1.1.4            R6_2.4.0              DT_0.9               
## [11] lazyeval_0.2.2        colorspace_1.4-1      withr_2.1.2           tidyselect_0.2.5      gridExtra_2.3        
## [16] compiler_3.6.1        RISmed_2.1.7          cli_1.1.0             rvest_0.3.4           factoextra_1.0.5     
## [21] flashClust_1.01-2     xml2_1.2.2            labeling_0.3          scales_1.0.0          digest_0.6.20        
## [26] rmarkdown_1.14.3      stringdist_0.9.5.2    pkgconfig_2.0.2       htmltools_0.3.6       FactoMineR_1.42      
## [31] htmlwidgets_1.3       rlang_0.4.0           readxl_1.3.1          rstudioapi_0.10       shiny_1.3.2          
## [36] farver_1.1.0          generics_0.0.2        rscopus_0.6.6         jsonlite_1.6          dendextend_1.12.0    
## [41] leaps_3.0             fansi_0.4.0           Rcpp_1.0.2            munsell_0.5.0         shinycssloaders_0.2.0
## [46] viridis_0.5.1         scatterplot3d_0.3-41  stringi_1.4.3         yaml_2.2.0            MASS_7.3-51.4        
## [51] plyr_1.8.4            grid_3.6.1            parallel_3.6.1        promises_1.0.1        ggrepel_0.8.1        
## [56] crayon_1.3.4          lattice_0.20-38       cowplot_1.0.0         haven_2.1.1           hms_0.5.0            
## [61] zeallot_0.1.0         pillar_1.4.2          ggsignif_0.6.0        reshape2_1.4.3        glue_1.3.1           
## [66] evaluate_0.14         modelr_0.1.5          vctrs_0.2.0           tweenr_1.0.1          httpuv_1.5.1         
## [71] testthat_2.2.1        networkD3_0.4         cellranger_1.1.0      gtable_0.3.0          polyclip_1.10-0      
## [76] assertthat_0.2.1      xfun_0.8              ggforce_0.3.0         mime_0.7              xtable_1.8-4         
## [81] broom_0.5.2           later_0.8.0           viridisLite_0.3.0     shinythemes_1.1.2     cluster_2.1.0        
## [86] toOrdinal_1.1-0.0

  1. Rakas, M., & Hain, D. S. (2019). The state of innovation system research: What happens beneath the surface? Research Policy, 48(9). DOI: https://doi.org/10.1016/j.respol.2019.04.011